25 March 2019
show where your plots are
show how variables are distributed spatially
Open-source, free
Open-source, free
Workflow and reproducibility
Open-source, free
Workflow and reproducibility
Quite efficient

Geographic (or unprojected) CRS
Projected CRS
What are geographic coordinate systems?
What are projected coordinate systems?
Many, many ways to represent the 3-D shape of the earth and to project it in a 2-D plane
EPSG or a proj4stringThe EPSG code is a numeric representation of a CRS, while the proj4string reprensents the full set of parameters spelled out in a string:
EPSG4326<=> proj4+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
EPSG32188<=> proj4+proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
All geographic files were created using a specific CRS - but it’s not always defined!
To find CRS in any format: Spatial Reference
sf
Why use the sf package when sp is already tried and tested?
Simple features refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers
successor of sp
sf incorporates the functionality of the 3 main packages of the sp paradigm in a single, cohesive whole:
sp for the class system;rgdal for reading and writing data;rgeos for spatial operations undertaken by GEOS.Why using sf while sp is still there and work just fine?
sf objects are easy to manipulate
st_ for easy tab completion%>%)dplyr and tidyr verbs have been defined for the sf objectsggplot2 friendlyMULTIPOINT - sample points of water quality measures in Montreal 1MULTILINESTRING - streams and rivers of Montreal 1MULTIPOLYGON - polygons of land use types 2MULTIPOLYGON - Canadian boundaries (retrieved directly from R)raster - canopy index of Montreal 2raster - altitude (retrieved directly from R)library(sf) # for simple features vector library(lwgeom) # for st_make_valid library(dplyr) # for data manipulation library(reshape2) # for data manipulation library(RColorBrewer) # for color palette library(raster) # for raster data library(mapview) # for interactive map
MULTIPOINTThis dataset defines the localisation of the sampling points for the RUISSO program. The latter consists in analyzing the bacteriological and physicochemical quality of inland streams and watercourses in Montreal.
# Create a new directory to download data
if(!dir.exists("data")) dir.create("data")
# Download csv file from web page in your working directory
if (!file.exists("data/ruisso.csv"))
download.file("http://donnees.ville.montreal.qc.ca/dataset/86843d31-4251-4002-b10d-620aaa751092/resource/adad6c48-fb48-40fc-a031-1ac870d978b4/download/scri03.-infor03.02-informatique03.02.07-donneesouvertesrsmaruissostationsstations_ruisso.csv",
destfile = "data/ruisso.csv")
# Read csv file in R
ruisso <- read.csv("data/ruisso.csv", header = T, dec = ",")
names(ruisso)
## [1] "Plan.d.eau" "Point.d.échantillonnage" ## [3] "Localisation" "Administration" ## [5] "LATITUDE" "LONGITUDE" ## [7] "Activité"
sf MULTIPOINT objectWhen converting from a data.frame, you have to specify the column names or numbers holding coordinates and the CRS, while for sp objects, it’s not required.
ruisso_sf <- st_as_sf(x = ruisso, coords = c("LONGITUDE", "LATITUDE"), crs = 4326)
# the CRS is given in the metadata
head(ruisso_sf, 1)
## Simple feature collection with 1 feature and 5 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -73.93704 ymin: 45.45022 xmax: -73.93704 ymax: 45.45022 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## Plan.d.eau Point.d.échantillonnage ## 1 Rivière à l'Orme AAO-0.0 ## Localisation ## 1 Pierrefonds, boul. Gouin O, 40m au nord de la rue de l'Anse à l'Orme, exutoire au lac des Deux Montagnes. ## Administration Activité geometry ## 1 Pierrefonds-Roxboro Actif POINT (-73.93704 45.45022)
MULTIPOINTInstead of creating a single map, as with sp object, the default plot of sf object creates multiple maps, one for each attribute, which can be useful for exploring the spatial distribution of different variables.
plot(ruisso_sf)
MULTIPOINTTo plot only the geometry and not all attributes, we retrieve the geometry list-column using st_geometry():
plot(st_geometry(ruisso_sf))
This dataset contains the actual water quality measurements at the RUISSO sampling points.
# Download csv file from web page in your working directory
if (!file.exists("data/donnees_ruisso_2016.csv"))
download.file("http://donnees.ville.montreal.qc.ca/dataset/8c149ace-7b2f-4041-99ec-3bdbef5dcee6/resource/38c8eb26-46a1-4fc2-87a0-5c88e94cee8e/download/donnees_ruisso_2016.csv",
destfile = "data/ruisso_data.csv")
# Read csv file in R
ruisso_data <- read.csv("data/ruisso_data.csv", header = T, dec = ",")
head(ruisso_data, 3)
## Point.d.échantillonnage Date Raison.d.annulation X.OD O2..mg.L. ## 1 AAO-0.0 2016/05/12 110 10.7 ## 2 AAO-0.0 2016/06/01 74 7.4 ## 3 AAO-0.0 2016/06/22 72 6.7 ## Conductivité..µs.cm2. pH Température..oC. Signe.COLI COLI MÉTÉO ## 1 514 7.9 16.9 < 10 1 ## 2 1434 7.9 15.0 = 54 1 ## 3 1474 7.7 19.1 = 230 0 ## Ag..µg.L. Al..µg.L. As..µg.L. Ba..µg.L. Be..µg.L. Ca..µg.L. Cd..µg.L. ## 1 0.1 214 0.4 37 0.1 44200 0.1 ## 2 0.1 60 0.5 69 0.1 109000 0.1 ## 3 0.1 761 0.9 58 0.1 104000 0.1 ## Co..µg.L. COT..µg.L. Cr..µg.L. Cu..µg.L. Fe..µg.L. K..µg.L. Mg..µg.L. ## 1 0.2 7.2 0.6 1.9 364 1970 15600 ## 2 0.2 6.4 0.3 3.2 271 4190 37100 ## 3 0.8 6.1 2.2 4.3 1200 4600 39300 ## Mn..µg.L. Mo..µg.L. Na..µg.L. NH3..µg.L. Ni..µg.L. Ptot..µg.L. Pb..µg.L. ## 1 40.4 1.5 46600 30 1.2 37 0.5 ## 2 70.3 4.1 100000 160 2.3 33 0.2 ## 3 82.8 4.1 100000 290 3.0 129 3.2 ## MES...mg.L. Sb..µg.L. Se..µg.L. Sn2.ug.L Tl.ug.L U..µg.L. V..µg.L. ## 1 6.8 0.5 0.5 1 0.2 0.8 0.9 ## 2 5.7 0.5 0.5 1 0.2 1.7 0.6 ## 3 35.3 0.5 0.5 1 0.2 1.8 2.8 ## Zn..µg.L. ## 1 7 ## 2 7 ## 3 12

Let’s now combine the MULTIPOINT object of sample coordinates to the water quality measurements using left_join() from dplyr, based on a shared ‘key’ variable.
See dplyr and tidyr cheatsheet for other useful functions.
ruisso_sf <- left_join(ruisso_sf, ruisso_data, by = "Point.d.échantillonnage") names(ruisso_sf)
## [1] "Plan.d.eau" "Point.d.échantillonnage" ## [3] "Localisation" "Administration" ## [5] "Activité" "Date" ## [7] "Raison.d.annulation" "X.OD" ## [9] "O2..mg.L." "Conductivité..µs.cm2." ## [11] "pH" "Température..oC." ## [13] "Signe.COLI" "COLI" ## [15] "MÉTÉO" "Ag..µg.L." ## [17] "Al..µg.L." "As..µg.L." ## [19] "Ba..µg.L." "Be..µg.L." ## [21] "Ca..µg.L." "Cd..µg.L." ## [23] "Co..µg.L." "COT..µg.L." ## [25] "Cr..µg.L." "Cu..µg.L." ## [27] "Fe..µg.L." "K..µg.L." ## [29] "Mg..µg.L." "Mn..µg.L." ## [31] "Mo..µg.L." "Na..µg.L." ## [33] "NH3..µg.L." "Ni..µg.L." ## [35] "Ptot..µg.L." "Pb..µg.L." ## [37] "MES...mg.L." "Sb..µg.L." ## [39] "Se..µg.L." "Sn2.ug.L" ## [41] "Tl.ug.L" "U..µg.L." ## [43] "V..µg.L." "Zn..µg.L." ## [45] "geometry"
Keep only active sites
ruisso_sf1 <- filter(ruisso_sf, Activité == "Actif") # same as # ruisso_sf1 <- filter(ruisso_sf, Activité != "Inactif") dim(ruisso_sf)
## [1] 361 45
dim(ruisso_sf1)
## [1] 345 45
Remove unwanted variables
ruisso_sf2 <- dplyr::select(ruisso_sf1,
-Localisation,
-Activité,
-Raison.d.annulation,
-Signe.COLI,
-MÉTÉO)
Note:
rasteranddplyrpackages have a functionselect(). To avoid an error message when both packages are loaded, we explicitly use the namespace of the package :dplyr::select().
names(ruisso_sf2)
## [1] "Plan.d.eau" "Point.d.échantillonnage" ## [3] "Administration" "Date" ## [5] "X.OD" "O2..mg.L." ## [7] "Conductivité..µs.cm2." "pH" ## [9] "Température..oC." "COLI" ## [11] "Ag..µg.L." "Al..µg.L." ## [13] "As..µg.L." "Ba..µg.L." ## [15] "Be..µg.L." "Ca..µg.L." ## [17] "Cd..µg.L." "Co..µg.L." ## [19] "COT..µg.L." "Cr..µg.L." ## [21] "Cu..µg.L." "Fe..µg.L." ## [23] "K..µg.L." "Mg..µg.L." ## [25] "Mn..µg.L." "Mo..µg.L." ## [27] "Na..µg.L." "NH3..µg.L." ## [29] "Ni..µg.L." "Ptot..µg.L." ## [31] "Pb..µg.L." "MES...mg.L." ## [33] "Sb..µg.L." "Se..µg.L." ## [35] "Sn2.ug.L" "Tl.ug.L" ## [37] "U..µg.L." "V..µg.L." ## [39] "Zn..µg.L." "geometry"
Rename variables
ruisso_sf3 <- rename(ruisso_sf2, river = Plan.d.eau, sample_pts = Point.d.échantillonnage, dissolve_O = X.OD) names(ruisso_sf3)
## [1] "river" "sample_pts" ## [3] "Administration" "Date" ## [5] "dissolve_O" "O2..mg.L." ## [7] "Conductivité..µs.cm2." "pH" ## [9] "Température..oC." "COLI" ## [11] "Ag..µg.L." "Al..µg.L." ## [13] "As..µg.L." "Ba..µg.L." ## [15] "Be..µg.L." "Ca..µg.L." ## [17] "Cd..µg.L." "Co..µg.L." ## [19] "COT..µg.L." "Cr..µg.L." ## [21] "Cu..µg.L." "Fe..µg.L." ## [23] "K..µg.L." "Mg..µg.L." ## [25] "Mn..µg.L." "Mo..µg.L." ## [27] "Na..µg.L." "NH3..µg.L." ## [29] "Ni..µg.L." "Ptot..µg.L." ## [31] "Pb..µg.L." "MES...mg.L." ## [33] "Sb..µg.L." "Se..µg.L." ## [35] "Sn2.ug.L" "Tl.ug.L" ## [37] "U..µg.L." "V..µg.L." ## [39] "Zn..µg.L." "geometry"
Remove symbols from column names:
names(ruisso_sf3) <- gsub("\\..*", "", names(ruisso_sf3))
names(ruisso_sf3)
## [1] "river" "sample_pts" "Administration" "Date" ## [5] "dissolve_O" "O2" "Conductivité" "pH" ## [9] "Température" "COLI" "Ag" "Al" ## [13] "As" "Ba" "Be" "Ca" ## [17] "Cd" "Co" "COT" "Cr" ## [21] "Cu" "Fe" "K" "Mg" ## [25] "Mn" "Mo" "Na" "NH3" ## [29] "Ni" "Ptot" "Pb" "MES" ## [33] "Sb" "Se" "Sn2" "Tl" ## [37] "U" "V" "Zn" "geometry"
Remove accents from column names
names(ruisso_sf3) <- gsub("é", "e", names(ruisso_sf3))
names(ruisso_sf3)
## [1] "river" "sample_pts" "Administration" "Date" ## [5] "dissolve_O" "O2" "Conductivite" "pH" ## [9] "Temperature" "COLI" "Ag" "Al" ## [13] "As" "Ba" "Be" "Ca" ## [17] "Cd" "Co" "COT" "Cr" ## [21] "Cu" "Fe" "K" "Mg" ## [25] "Mn" "Mo" "Na" "NH3" ## [29] "Ni" "Ptot" "Pb" "MES" ## [33] "Sb" "Se" "Sn2" "Tl" ## [37] "U" "V" "Zn" "geometry"
There are multiple measurements during the summer.
ruisso_sf3
## Simple feature collection with 345 features and 39 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -73.93704 ymin: 45.42516 xmax: -73.50467 ymax: 45.69734 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## river sample_pts Administration Date dissolve_O ## 1 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/05/12 110.0 ## 2 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/06/01 74.0 ## 3 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/06/22 72.0 ## 4 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/08/02 79.0 ## 5 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/08/22 79.0 ## 6 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/09/20 76.8 ## 7 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/10/25 96.9 ## 8 Rivière à l'Orme AAO-3.3P6 Kirkland 2016/05/12 110.0 ## 9 Rivière à l'Orme AAO-3.3P6 Kirkland 2016/06/01 81.0 ## 10 Rivière à l'Orme AAO-3.3P6 Kirkland 2016/06/22 93.0 ## O2 Conductivite pH Temperature COLI Ag Al As Ba Be Ca Cd ## 1 10.7 514 7.9 16.9 10 0.1 214 0.4 37 0.1 44200 0.1 ## 2 7.4 1434 7.9 15.0 54 0.1 60 0.5 69 0.1 109000 0.1 ## 3 6.7 1474 7.7 19.1 230 0.1 761 0.9 58 0.1 104000 0.1 ## 4 6.8 1346 7.9 22.3 300 0.1 214 0.8 67 0.1 98200 0.1 ## 5 7.3 793 7.8 19.1 550 0.1 353 0.5 56 0.1 71200 0.1 ## 6 7.1 1286 7.7 18.8 72 0.1 207 0.5 61 0.1 102000 0.1 ## 7 11.0 1414 7.7 9.4 99 0.1 259 0.4 77 0.1 122000 0.1 ## 8 11.6 1589 7.9 12.9 1200 0.1 81 0.3 60 0.1 125000 0.1 ## 9 8.1 1639 8.0 15.6 4400 0.1 95 0.4 56 0.1 123000 0.1 ## 10 9.2 897 7.9 15.6 33000 0.1 2600 1.1 63 0.1 72700 0.1 ## Co COT Cr Cu Fe K Mg Mn Mo Na NH3 Ni Ptot Pb ## 1 0.2 7.2 0.6 1.9 364 1970 15600 40.4 1.5 46600 30 1.2 37 0.5 ## 2 0.2 6.4 0.3 3.2 271 4190 37100 70.3 4.1 100000 160 2.3 33 0.2 ## 3 0.8 6.1 2.2 4.3 1200 4600 39300 82.8 4.1 100000 290 3.0 129 3.2 ## 4 0.3 15.1 0.5 1.6 393 4180 36000 65.1 3.9 100000 90 2.0 67 0.9 ## 5 0.4 4.3 1.3 3.0 533 3150 19500 44.2 2.9 58200 120 2.3 55 0.8 ## 6 0.2 5.3 0.7 3.2 367 4100 33700 18.9 3.3 100000 40 2.7 31 0.7 ## 7 0.3 4.2 0.9 3.1 422 5590 35900 24.7 3.8 100000 70 2.4 36 0.6 ## 8 0.2 3.8 0.3 3.3 237 4070 36300 57.2 2.3 100000 60 2.0 23 0.2 ## 9 0.2 3.7 0.3 8.0 258 4840 35400 68.3 3.1 100000 100 2.1 207 0.2 ## 10 2.7 23.6 9.9 30.0 3950 4790 20900 261.0 2.6 92300 490 7.6 430 5.6 ## MES Sb Se Sn2 Tl U V Zn geometry ## 1 6.8 0.5 0.5 1 0.2 0.8 0.9 7 POINT (-73.93704 45.45022) ## 2 5.7 0.5 0.5 1 0.2 1.7 0.6 7 POINT (-73.93704 45.45022) ## 3 35.3 0.5 0.5 1 0.2 1.8 2.8 12 POINT (-73.93704 45.45022) ## 4 9.0 0.5 0.5 1 0.2 1.8 1.7 7 POINT (-73.93704 45.45022) ## 5 10.3 0.5 0.5 1 0.2 1.2 1.7 7 POINT (-73.93704 45.45022) ## 6 3.6 0.5 0.5 1 0.2 1.3 1.2 7 POINT (-73.93704 45.45022) ## 7 5.8 0.5 0.5 1 0.2 2.2 1.2 7 POINT (-73.93704 45.45022) ## 8 3.4 0.5 0.5 1 0.2 1.9 0.7 7 POINT (-73.90147 45.43689) ## 9 5.0 0.5 0.5 1 0.2 1.6 1.0 7 POINT (-73.90147 45.43689) ## 10 161.0 1.2 0.5 1 0.2 0.9 9.4 108 POINT (-73.90147 45.43689)
We will use the mean of water quality measurements. We could have used the max or min or chosen a single month of interest.
ruisso_mean <- ruisso_sf3 %>% group_by(sample_pts) %>% # split the data into groups of samples summarise_at(vars(O2:Zn), mean, na.rm = TRUE) # mean by group on selected variables ruisso_mean
## Simple feature collection with 50 features and 35 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -73.93704 ymin: 45.42516 xmax: -73.50467 ymax: 45.69734 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## # A tibble: 50 x 36 ## sample_pts O2 Conductivite pH Temperature COLI Ag Al As ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 AAO-0.0 8.14 1180. 7.8 17.2 1.88e2 0.1 295. 0.571 ## 2 AAO-3.3P6 9.5 1378. 7.89 16.4 1.86e4 0.1 432. 0.457 ## 3 AAO-3.5 8.79 1281. 7.74 14.3 6.12e2 0.1 150. 0.629 ## 4 AAO-3.6 9 1128. 7.97 16.1 4.61e2 0.1 217. 0.429 ## 5 AAO-6.4P12 10.1 1461. 7.96 18.2 1.47e2 0.1 33.1 0.243 ## 6 AAO-6.5 7.4 567. 8.12 16.5 1.22e3 0.1 109. 0.7 ## 7 ADM-1 9.97 333. 8.19 21.2 5.86e1 0.1 119 1.11 ## 8 ANG-2 9.7 399. 8.33 20.2 4.10e1 0.1 60.7 1.3 ## 9 BER-0.0 7.18 803. 7.61 17.1 3.05e2 0.1 140. 0.457 ## 10 BER-0.7P1 9.25 751. 7.76 17.1 1.43e3 0.1 75.7 0.529 ## # … with 40 more rows, and 27 more variables: Ba <dbl>, Be <dbl>, ## # Ca <dbl>, Cd <dbl>, Co <dbl>, COT <dbl>, Cr <dbl>, Cu <dbl>, Fe <dbl>, ## # K <dbl>, Mg <dbl>, Mn <dbl>, Mo <dbl>, Na <dbl>, NH3 <dbl>, Ni <dbl>, ## # Ptot <dbl>, Pb <dbl>, MES <dbl>, Sb <dbl>, Se <dbl>, Sn2 <dbl>, ## # Tl <dbl>, U <dbl>, V <dbl>, Zn <dbl>, geometry <POINT [°]>
plot(ruisso_mean)
myPal <- colorRampPalette(c("blue", "red"))
plot(ruisso_mean["Temperature"],
pal = myPal, nbreaks = 30, pch = 19, key.pos = 1,
main = "Water temperature in streams of Montreal")
MULTIPOINT to a ShapefileWe can write a simple features object to a file (e.g. a shapefile) using the st_write() function in sf (see vignette #2)
st_write(ruisso_mean, dsn = "data/ruisso.shp", driver = "ESRI Shapefile", delete_dsn = T)
## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ## ESRI Shapefile driver
Some argument specifications depend on the driver, see
?st_write
MULTIPOLYGONWe will now import a land use vector map of Montreal.
The land use dataset is composed of multiple individual shapefiles, the following manipulations allow to download, unzip all the individual files, combine them and save the resulting file as a GeoPackage. These manipulations can take a few minutes and should have been performed before the workshop using this R script.
Don’t run these lines now!
# Download shapefiles
download.file("http://cmm.qc.ca/fileadmin/user_upload/geomatique/UtilisationDuSol/2016_Shapefiles/660-US-2016.zip", destfile = "data/landuse.zip")
# Unzip the main folder and name it "landuse"
unzip("landuse.zip", exdir="data/landuse")
# get all the zip files inside the main folder "landuse"
zipF <- list.files(path = "data/landuse/", pattern = "*.zip", full.names = TRUE)
# unzip all your files
plyr::ldply(.data = zipF, .fun = unzip, exdir = "data/landuse")
Don’t run these lines now!
# Get the names of the land use shapefiles from the folder "landuse"
shp_files <- list.files(path = "data/landuse", pattern = ".shp")
shp_files <- sub(".shp", "", shp_files)
# Read all the shapefiles
LU <- list()
for(f in shp_files) {
LU[[f]] <- st_read(dsn = "data/landuse/", layer = f)
}
# Combine all shapefiles together
LU_all <- do.call(rbind, LU)
# Write as a GeoPackage
st_write(LU_all, "data/LU_all.gpkg", driver = "GPKG")
# Read GeoPackage in R LU <- st_read(dsn = "data/LU_all.gpkg")
## Reading layer `LU_all' from data source `/Users/mariehbrice/Documents/GitHub/Rspatial/docs/data/LU_all.gpkg' using driver `GPKG' ## Simple feature collection with 107233 features and 38 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 265961.2 ymin: 5027324 xmax: 306814.6 ymax: 5063078 ## epsg (SRID): NA ## proj4string: +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
Notice that the EPSG code is not defined
To define the CRS when it’s missing we can use st_set_crs().
Note however that replacing crs does not re-project data. If we wanted to transform coordinate and re-project them, we would use st_transform() .
LU <- st_set_crs(LU, 32188)
MULTIPOLYGONWe can map only one land use at a time by subsetting first the sf object. UTIL_SOL==1000 represents water. Don’t try to plot everything… it would take forever!
plot(st_geometry(subset(LU, UTIL_SOL==1000)))
Geometry validity refers to essential properties of polygons, such as:
Invalid geometry can be ignored for mapping but cause problem for many spatial operations.
st_is_valid() returns a logical vector indicating for each polygon geometries indicating whether it is topologically valid:
head(invalid_poly <- which(!st_is_valid(LU)))
## [1] 862 1149 1251 1406 2071 2119
Using the argument reason = TRUE returns the reason for invalidity:
st_is_valid(LU[invalid_poly,], reason = TRUE)[1:8]
## [1] "Nested shells[295499.0604 5045532.3412]" ## [2] "Nested shells[298395.5742 5045168.6803]" ## [3] "Self-intersection[297511.205 5032954.7058]" ## [4] "Nested shells[273102.34356 5035592.911633]" ## [5] "Nested shells[274053.456247 5036041.908298]" ## [6] "Nested shells[288581.300144 5036823.875472]" ## [7] "Self-intersection[294414.9683 5031596.2768]" ## [8] "Self-intersection[295265.9577 5045076.9297]"
par(mfrow = c(1,2)) plot(st_geometry(LU[862,]), main = "Nested shells") plot(st_geometry(LU[1251,]), main = "Self-intersection")
We can use st_make_valid() from lwgeom to make an invalid geometry valid
LU_val <- st_make_valid(LU)
# let's verify if all geometries are now valid which(!st_is_valid(LU_val)) # yeah!
## integer(0)

st_read(dsn = path_to_file, layer = file_name, driver = "ESRI Shapefile") and name it courseau
If you did not load the shapefile before the workshop you can download it using
download.file()from this link : http://donnees.ville.montreal.qc.ca/dataset/c128aff5-325c-4599-ab66-1c9d0b3abc94/resource/a37e11d4-f0a3-46a7-8636-76754fad72b3/download/courseau.zip
# Download shapefile from web page in your working directory
if (!file.exists("data/courseau.zip"))
download.file("http://donnees.ville.montreal.qc.ca/dataset/c128aff5-325c-4599-ab66-1c9d0b3abc94/resource/a37e11d4-f0a3-46a7-8636-76754fad72b3/download/courseau.zip", destfile = "data/courseau.zip")
# Unzip the main folder
unzip("data/courseau.zip", exdir = "data/courseau")
# For shapefiles, dsn may be the character string holding the geojson data
courseau <- st_read(dsn = "data/courseau/", layer = "courseau", quiet = T)
courseau
## Simple feature collection with 1306 features and 5 fields ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: -73.97268 ymin: 45.41593 xmax: -73.4975 ymax: 45.69939 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## OBJECTID_1 NOM TYPE Shape_Le_1 NuméroRui ## 1 1 rivière à l'Orme rivière 177.95299 <NA> ## 2 2 rivière à l'Orme rivière 128.51146 <NA> ## 3 3 <NA> fossé 172.42988 <NA> ## 4 4 rivière à l'Orme rivière 216.66838 <NA> ## 5 8 <NA> fossé 540.29539 <NA> ## 6 9 <NA> ruisseau 97.66412 <NA> ## 7 10 <NA> ruisseau 162.30584 <NA> ## 8 11 <NA> ruisseau 210.75794 <NA> ## 9 14 <NA> ruisseau 608.28651 <NA> ## 10 16 <NA> fossé 267.61108 <NA> ## geometry ## 1 MULTILINESTRING ((-73.9107 ... ## 2 MULTILINESTRING ((-73.90824... ## 3 MULTILINESTRING ((-73.90667... ## 4 MULTILINESTRING ((-73.91472... ## 5 MULTILINESTRING ((-73.91029... ## 6 MULTILINESTRING ((-73.9206 ... ## 7 MULTILINESTRING ((-73.91969... ## 8 MULTILINESTRING ((-73.91727... ## 9 MULTILINESTRING ((-73.91727... ## 10 MULTILINESTRING ((-73.91212...
par(mar=c(0,0,0,0)) plot(st_geometry(courseau))
par(mar=c(0,0,0,0)) plot(courseau["TYPE"], key.pos = 1, pal = brewer.pal(4, "Paired"))
rasterraster classesraster provides three main classes of raster object:
RasterLayer - a single-layer (variable) raster (e.g. elevation)
RasterStack - one single object several single-layer (variable) rasters stored in one or different files (e.g. Worldclim bioclimatic variables)
RasterBrick one single object several single-layer (variable) rasters stored in one single file (e.g. a single multispectral satellite file)
We now import raster data use a .tif file of a canopy index from Montreal.
# Download tif file from web page in your working directory
if (!file.exists("data/canopee.zip")){
download.file("http://cmm.qc.ca/fileadmin/user_upload/geomatique/IndiceCanopee/2015/660_ Canopee2015_3m.zip", destfile = "data/canopee.zip") }
unzip("data/canopee.zip", exdir = "data")
# Read tif in R using raster
# The file named "660_CLASS_3m.tif" contains the canopy index for all the Montreal area, so we can read this file only
canopee_mtl <- raster("data/660_CLASS_3m.tif")
rastercanopee_mtl
## class : RasterLayer ## dimensions : 35754, 40854, 1460693916 (nrow, ncol, ncell) ## resolution : 1, 1 (x, y) ## extent : 265961, 306815, 5027324, 5063078 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## data source : /Users/mariehbrice/Documents/GitHub/Rspatial/docs/data/660_CLASS_3m.tif ## names : X660_CLASS_3m ## values : 1, 5 (min, max)
The canopy index dataset is a RasterLayer with values from 1 to 5, 35754 pixels by row and 40854 pixels by column and resolution of 1m x 1m.
rasterSimilar to the sf package, raster also provides plot methods for its own classes.
plot(canopee_mtl)
getDataIn package raster, getData() function generates requests to access to different spatial datasets (either raster or sp objects). Argument name specifies the dataset you wish to download.
GADM- global administrative boundaries at different level of administrative subdivision
worldclim- global interpolated climate data
altandSTRM- coarse and fine resolution elevation data
ISO3- 3 letter ISO codes for country names.
head(getData("ISO3"))
## ISO3 NAME ## 1 AFG Afghanistan ## 2 XAD Akrotiri and Dhekelia ## 3 ALA Åland ## 4 ALB Albania ## 5 DZA Algeria ## 6 ASM American Samoa
getDataRetrieve a raster of altitude for Canada.
# alt90 <- getData('SRTM', lon = -73.7, lat = 45.5) # Fine resolution
altCAN <- getData(name = "alt", country = "CAN", path = "data/") # Coarse resolution
altCAN
## class : RasterLayer ## dimensions : 4992, 10620, 53015040 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : -141.1, -52.6, 41.6, 83.2 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : /Users/mariehbrice/Documents/GitHub/Rspatial/docs/data/CAN_msk_alt.grd ## names : CAN_msk_alt ## values : -108, 5800 (min, max)
Because the resolution of the altitude raster (0.0083x0.0083° = 650x926m) is a lot coarser than that of the canopy index (1x1m), we will use the altitude raster for examples of raster manipulations to reduce computation time.
getDataRetrieve a raster of altitude for Canada.
plot(altCAN)

getData() to retrieve Canadian boundary map at the provincial level
?getDataRe-project your new object using the same projection that of the land use polygons (st_crs(LU_val) or with the EPSG code 32188)
Try to plot the geometry of the Quebec polygon.
Try qc_simple_prj <- st_simplify(qc_prj, dTolerance = 500, preserveTopology = F) and plot the geometry of this new object. Is there a difference?
# get canada boundary map
can <- raster::getData("GADM", country = "CAN", level = 1, path = "data/")
# convert to sf
can <- st_as_sf(can)
# Subset Quebec
qc <- can[can$NAME_1 == "Québec",]
# transform projection
qc_prj <- st_transform(qc, 32188)
# simplify the polygons so it's lighter
qc_simple_prj <- st_simplify(qc_prj, dTolerance = 500, preserveTopology = F)
plot(st_geometry(qc_simple_prj))
sfruisso_buf <- st_buffer(st_geometry(ruisso_mean), dist = 0.01)
## Warning in st_buffer.sfc(st_geometry(ruisso_mean), dist = 0.01): st_buffer ## does not correctly buffer longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
dist() is assumed to be in decimal degreesThis message indicates that sf assumes a distance value is given in degrees
st_buffer() does not correctly buffer longitude/latitude dataThis warning indicates that the result may be incorrect becausest_bufferexpects coordinates in a 2-D, Euclidean space, which is not the case for longitude latitude data. So we should re-project the data onto a projected CRS.
plot(st_geometry(ruisso_mean), pch = 19, cex= .8) plot(ruisso_buf, add = T, border= "blue3")
st_transform()We can re-project the sample points using a suitable projection that has units of meters. To do this, we will use the projection from our land use polygons.
(ruisso_proj <- st_transform(ruisso_mean, crs = st_crs(LU_val)))
## Simple feature collection with 50 features and 35 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: 270615.3 ymin: 5031758 xmax: 304436.2 ymax: 5061940 ## epsg (SRID): 32188 ## proj4string: +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## # A tibble: 50 x 36 ## sample_pts O2 Conductivite pH Temperature COLI Ag Al As ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 AAO-0.0 8.14 1180. 7.8 17.2 1.88e2 0.1 295. 0.571 ## 2 AAO-3.3P6 9.5 1378. 7.89 16.4 1.86e4 0.1 432. 0.457 ## 3 AAO-3.5 8.79 1281. 7.74 14.3 6.12e2 0.1 150. 0.629 ## 4 AAO-3.6 9 1128. 7.97 16.1 4.61e2 0.1 217. 0.429 ## 5 AAO-6.4P12 10.1 1461. 7.96 18.2 1.47e2 0.1 33.1 0.243 ## 6 AAO-6.5 7.4 567. 8.12 16.5 1.22e3 0.1 109. 0.7 ## 7 ADM-1 9.97 333. 8.19 21.2 5.86e1 0.1 119 1.11 ## 8 ANG-2 9.7 399. 8.33 20.2 4.10e1 0.1 60.7 1.3 ## 9 BER-0.0 7.18 803. 7.61 17.1 3.05e2 0.1 140. 0.457 ## 10 BER-0.7P1 9.25 751. 7.76 17.1 1.43e3 0.1 75.7 0.529 ## # … with 40 more rows, and 27 more variables: Ba <dbl>, Be <dbl>, ## # Ca <dbl>, Cd <dbl>, Co <dbl>, COT <dbl>, Cr <dbl>, Cu <dbl>, Fe <dbl>, ## # K <dbl>, Mg <dbl>, Mn <dbl>, Mo <dbl>, Na <dbl>, NH3 <dbl>, Ni <dbl>, ## # Ptot <dbl>, Pb <dbl>, MES <dbl>, Sb <dbl>, Se <dbl>, Sn2 <dbl>, ## # Tl <dbl>, U <dbl>, V <dbl>, Zn <dbl>, geometry <POINT [m]>
ruisso_buf <- st_buffer(st_geometry(ruisso_proj), dist = 500) # add an attribute for sample points id ruisso_buf <- st_sf(sample_pts = ruisso_mean$sample_pts, ruisso_buf) plot(st_geometry(ruisso_proj), pch = 19, cex= .5) plot(st_geometry(ruisso_buf), add = T, border= "blue3")

Re-project the courseau object using the land use projection and name it courseau_proj
Create a 300m buffer around each watercourse and name it courseau_buf. Plot the resulting geometry with courseau_proj on top.
Try st_union() on courseau_buf. Plot the resulting geometry and compare with courseau_buf.
Try st_centroid() on ruisso_buf. Plot the resulting geometry with ruisso_buf under.
For more geometric operations such as
st_distance(),st_convex_hull(),st_intersection()see sf vignette #3
# transform projection courseau_proj <- st_transform(courseau, crs = st_crs(LU_val)) # create buffer courseau_buf <- st_buffer(st_geometry(courseau_proj), dist = 300) # add an unique ID to your buffer courseau_buf <- st_sf(ID = courseau_proj$OBJECTID_1, courseau_buf)
plot(st_geometry(courseau_buf), col=NA, border="black", main = "st_buffer") plot(courseau_proj["TYPE"], lwd = 1.2, add = T)
# compute buffer buf_union <- st_union(courseau_buf) plot(st_geometry(courseau_buf), col=NA, border="black", main = "st_buffer") plot(st_geometry(buf_union), col=NA, border="black", main = "st_union")
# compute centroid ruisso_centroid <- st_centroid(ruisso_buf) plot(st_geometry(ruisso_buf), col=NA, border="black", main = "st_centroid") plot(st_geometry(ruisso_centroid), pch = 19, col="red", add = T)
## Warning in st_centroid.sf(ruisso_buf): st_centroid assumes attributes are ## constant over geometries of x
LU_reclas <- LU_val %>%
dplyr::select(ID, UTIL_SOL) %>%
mutate(UTIL_SOL = case_when(UTIL_SOL %in% c(100:114) ~ "resid",
UTIL_SOL %in% c(200:520) ~ "public_build",
UTIL_SOL == 600 ~ "green",
UTIL_SOL %in% c(700:760) ~ "road",
UTIL_SOL == 800 ~ "agri",
UTIL_SOL == 900 ~ "vacant",
UTIL_SOL == 1000 ~ "water",
UTIL_SOL == 1100 ~ "golf"))
case_whenallows you to vectorize multiple if and else if statements
Now we aim at finding the proportion of area per buffer occupied by different land-use types. st_intersection() can be used to obtain the intersection of two geometries (i.e. the area covered by both).
LU_inters <- st_intersection(LU_reclas, ruisso_buf)
## Warning: attribute variables are assumed to be spatially constant ## throughout all geometries
Warning: attribute variables are assumed to be spatially constantAttribute values are assigned to sub-geometries; if these are spatially constant, as for instance for land use, then this is fine. If they are aggregates, such as population count, then this is wrong.
plot(LU_inters["UTIL_SOL"], key.pos = 1)
# add in a column and compute area (area of each land use poly in intersect layer) LU_inters$areaLU <- st_area(LU_inters) # group data by sample points and land use type and calculate the total area for each land use per buffer LU_area <- LU_inters %>% group_by(sample_pts, UTIL_SOL) %>% summarise(areaLU = sum(areaLU)) # remove geometry st_geometry(LU_area) <- NULL head(LU_area)
## # A tibble: 6 x 3 ## # Groups: sample_pts [1] ## sample_pts UTIL_SOL areaLU ## <fct> <chr> <S3: units> ## 1 AAO-0.0 agri 1211.933 [m^2] ## 2 AAO-0.0 green 353192.022 [m^2] ## 3 AAO-0.0 public_build 5908.185 [m^2] ## 4 AAO-0.0 resid 29445.815 [m^2] ## 5 AAO-0.0 road 38497.788 [m^2] ## 6 AAO-0.0 vacant 110014.028 [m^2]
# buffer with 500m radius so area is pi*500^2 # try st_area(ruisso_buf) # compute proportion for each land use LU_area$propLU <- as.numeric(LU_area$areaLU/(pi*500^2)) # transform to wide format and create a new data frame containing our new landscape variables env_var <- dcast(data = LU_area, formula = sample_pts ~ UTIL_SOL, value.var = "propLU", fill = 0)
rasterraster for faster computationcrop() and mask() reduce object memory use and therefore computational time for raster manipulations.
crop() will decrease the extent of a raster, mask() will set to NA values outside a polygon boundary.
Let’s crop and mask the altitude raster using the buffer MULTIPOLYGON
# we have to reproject ruisso_buf to fit the raster and then convert it to a sp object alt_crop <- crop(altCAN, as(st_transform(ruisso_buf, 4326), "Spatial")) alt_mask <- mask(alt_crop, st_transform(ruisso_buf, 4326))
You could try to
crop()andmask()the altitude raster using the Quebec polygon.
raster for faster computationpar(mfrow=c(1,3)) plot(altCAN) plot(alt_crop) plot(alt_mask) plot(st_geometry(ruisso_buf), add = T)
rasterThe projection of alt_crop is in longitude/latitude.
alt_crop
## class : RasterLayer ## dimensions : 34, 53, 1802 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : -73.94167, -73.5, 45.41667, 45.7 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : in memory ## names : CAN_msk_alt ## values : 0, 179 (min, max)
We can use projectRaster() to transform the CRS of one spatial object to match another spatial object
st_crs(LU_val)[[2]] # to retrieve the proj4string
## [1] "+proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
alt_proj <- projectRaster(alt_crop, crs = st_crs(LU_val)[[2]]) alt_proj
## class : RasterLayer ## dimensions : 40, 63, 2520 (nrow, ncol, ncell) ## resolution : 651, 926 (x, y) ## extent : 266977.8, 307990.8, 5028068, 5065108 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## data source : in memory ## names : CAN_msk_alt ## values : 0.8431883, 174.3307 (min, max)
Compute the mean altitude per buffer
sample_alt <- raster::extract(alt_proj, as(ruisso_buf, "Spatial"), fun=mean, na.rm=TRUE) head(sample_alt)
## [,1] ## [1,] 23.44361 ## [2,] 28.08543 ## [3,] 28.08543 ## [4,] 28.12213 ## [5,] 30.03564 ## [6,] 30.03564
# Add this new environmental variables to our data frame env_var <- cbind.data.frame(env_var, altitude = sample_alt)
Now that we have a clean data frame with water quality variables (mean measurements of water quality in different watercourses) and a data frame of landscape variables of interest (% of different land use types in a 500m buffer and mean altitude in a 500m buffer), we could perform various statistical analyses.
For instance, we could run a Redundancy Analysis (RDA) to study the influence of the landscape on water quality. If we had data on macroinvertebrate abundance, we could study the relative importance of the landscape and water quality on their distribution using variation partitioning on RDA.
plot() allows combination of multiple layers of information in a same graph using add = T
plot(canopee_mtl) plot(st_geometry(courseau_proj), col = "#034ebf", lwd = 1.5, add = T) plot(st_geometry(ruisso_proj), pch = 19, col = "#B00C0C", add = T)
Change the color palette for the raster levels. There are 5 levels and only 2 are pertinent:
myPal <- c("white", "white", "#BAE4B3", "#006D2C", "white") # color palette
plot(canopee_mtl, col = myPal, legend = F) # canopy raster
plot(st_geometry(subset(LU_val, UTIL_SOL==1000)), # UTIL_SOL==1000 -> rivers
col = "#7eb0fc", border = "#7eb0fc",add=T)
plot(st_geometry(courseau_proj), col = "#034ebf", lwd = 1.5, add = T) # watercourse
plot(st_geometry(ruisso_proj), pch = 19, col = "#B00C0C", add = T) # sample points
legend("bottomright", legend = c("Low vegetation", "Canopy"), # legend
fill = c("#BAE4B3", "#006D2C"), border="white", bty = "n")
Add an inset
par(mar=c(3,3,2,0))
plot(canopee_mtl, col = myPal, legend = F) # canopy raster
plot(st_geometry(subset(LU_val, UTIL_SOL == 1000)), # UTIL_SOL==1000 -> rivers
col = "#7eb0fc", border = "#7eb0fc", add = T)
plot(st_geometry(courseau_proj), col = "#034ebf", lwd = 1.5, add = T) # watercourse
plot(st_geometry(ruisso_proj), pch = 19, col = "#B00C0C", add = T) # sample points
legend("bottomright", legend = c("Low vegetation", "Canopy"), # legend
fill = c("#BAE4B3", "#006D2C"), border="white", bty = "n")
par(fig = c(0.08, 0.6, 0.42, 1), new = T) # add inset at specified coordinates of the figure region
plot(st_geometry(qc_simple_prj), col = "grey85", bgc = "transparent") # Quebec polygon
points(286543, 5038936, pch = 17, col = "#B00C0C") # point on Montreal
mapviewJoin our environmental variables data frame as attributes to the sf sample points
ruisso_env <- left_join(ruisso_proj, env_var, by = "sample_pts")
## Warning: Column `sample_pts` joining character vector and factor, coercing ## into character vector
mapview(ruisso_env, zcol = 'COLI', legend=TRUE, layer.name = "E. coli density")
mapview(ruisso_env, cex = "public_build", map.types = "Esri.WorldImagery")
myPal <- colorRampPalette(brewer.pal(9, "YlGnBu"))
mapview(ruisso_env, zcol = "resid", cex = "pH", legend = TRUE, col.regions = myPal,
layer.name = "% of residential area")

Good tutorials for spatial data in R
sf manipulations
Maps in R
Get free data